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DISCLAIMER 


The  views,  opinions,  and/or  findings  contained  in  this  report  are  those 
of  the  authors  and  should  not  be  construed  as  an  official  Department  of  the 
Army  position,  policy  or  decision,  unless  so  designated  by  other  official 
documentat ion . 


I .  INTRODUCTION 


The  objective  of  this  research  was  to  investigate  the  concept,  design 
and  performance  analysis  of  a  computing  structure  to  solve  linear  ordinary 
differential  equations.  Previous  efforts  in  this  area  have  been  classified  as 
Digital  Differential  Analyzers.  ~  This  research  is  not  an  attempt  to  extend  this 
prior  effort,  but  rather  to  develop  an  entirely  new  approach  [1-14].  *This  re¬ 
port  summarizes  the  results  of  the  research  in  four  parts;  the  concept  for 
a  digital  computing  structure,  (2-)*  the  design  of  the  computing  structure, 

C3)^the  performance  of  the  computing  structure  and,  (4)  conclusions  and  recommend¬ 
ations  for  further  research.  - 


II.  CONCEPT 


A.  General 


Many  physical  systems  can  be  modeled  by  linear  ordinary  differential  equa¬ 
tions  of  order  N.  This  equation  can  be  solved  using  an  analog  computer  or  a 
general  purpose  digital  computer.  The  analog  computer  has  a  speed  advantage 
since  it  solves  the  problem  using  parallel  computing  elements.  The  digital 
computer  yields  a  more  accurate  solution,  and  eliminates  any  scaling  problems 
by  using  floating  point  arithmetic.  However,  the  digital  computer  is  slower 
than  the  analog  computer,  and  in  most  cases,  is  more  difficult  for  the  user  to 
access  [15].  The  research  goal  was  to  design  and  build  a  digital  computing 
structure  which  encompasses  the  advantages  of  both  these  computers  (except  for 
floating  point  arithmetic)  and  avoids  the  difficulties  of  each. 

B.  Mathematical  Forms  and  Solutions 

A  third  order  linear  ordinary  differential  equation  can  be  expressed  as 
■  •  •  •  ••  • 
y(t)  +  a2y(t)  +  a^yCt)  +  aQy(t)  =  b2X(t)  +  b1X(t)  +  bQX(t) 

•  •  •  (1) 
y(o)  =  C2,  y (o)  -  C^,  y (o)  *  CQ 


If  any  of  the  coefficients  a^  or  are  functions  of  t,  this  equation  is 
time  varying.  The  following  does  not  treat  this  case,  but  can  be  extended  to 
do  so.  In  order  to  solve  (1)  using  an  analog  computer,  it  is  necessary  to  pro¬ 
gram  the  computer  to  connect  the  hardware  as  shown  in  Figure  1.  This  structure 
will  then  yield  y(t)  as  a  function  of  the  input  X(t) .  It  is  important  to  note 
that  the  hardware  elements  operate  in  parallel  resulting  in  high  computation 
speed  for  the  analog  computer.  It  should  also  be  noted  that  the  third  order 
differential  equation  in  (1)  and  the  complementary  solution  structure  in  Figure  1 
can  be  extended  to  a  higher  order  system.  Since  this  does  not  add  to  this  dis¬ 
cussion,  the  extension  is  not  carried  out  in  this  report. 

A  common  transformation  to  convert  (1)  to  a  set  of  first  order  differential 
equations  is  [16] 


z3(t) 

““  *■ 

0  1  0 

z:(t) 

—  -> 

0 

z2(t) 

= 

0  0  1 

z2(t) 

+ 

0 

z3(t) 

_ "a0  -al  -a2_ 

N 

U> 

y-“N 

rr 

1 

Y(t) 

= 

-  b2  bl  b0  ] 

z2(t) 

z3(t) 


[°] 


X(t) 


(2) 


X(t) 


This  set  of  differential  equations  can  be  solved  using  the  structure  shown  in 
Figure  2.  In  general,  the  integrators  would  be  digital  processors  and  the  multi¬ 
plier  and  adder  blocks  would  also  be  digital  hardware.  This  structure  in 
Figure  2  is  rather  straightforward  for  a  third  order  differential  equation.  How¬ 
ever,  as  the  order  increases,  the  bottom  row  of  a^  coefficients  in  (2)  will  also 
increase.  This  implies  a  larger  collection  of  multipliers  and  adders  for  the 
implementation  in  Figure  2,  which  also  increases  the  solution  time. 
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An  alternate  transformation  on  (1)  will  yield 
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with  a  solution  structure  as  shown  in  Figure  3.  This  is  the  form  selected  for 
SPOCK  since  the  multiplier-adder  group  can  be  built  with  a  set  of  modules  where 
each  module  has  two  multipliers  and  two  adders.  This  results  in  a  higher  perform¬ 
ance  machine  and  some  savings  in  hardware.  The  design  of  the  SPOCK  structure, 
based  on  this  mathematical  formulation  of  the  differential  equation,  will  be 
given  in  the  next  section. 


III.  DESIGN  OF  THE  COMPUTING  STRUCTURE 

A.  Architecture 

1.  Organization:  The  structure  for  SPOCK  I  is  shown  in  Figure  4.  There 

are  five  modules;  namely,  (1)  the  processor,  (2)  input  bus  interface,  (3)  output 
bus  interface,  (4)  function,  and  (5)  coefficient  register  modules.  In  addition, 
a  host  computer  is  used  to  initialize  the  units,  load  the  necessary  routines, 
and  monitor  the  results.  A  bus  controller  is  shown  but  has  not  been  implemented 
in  hardware.  This  device  is  primarily  a  sequencer  and  controls  the  gating  be¬ 
tween  all  units  and  the  busses. 

2.  Processor  Module:  Each  processor  is  a  16-bit,  fixed  point  computer  j 

constructed  from  the  Intel  3000  integrated  circuit  family  [17].  Each  processor  1 
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has  an  auxiliary  multiplier  (16  x  16)  to  enhance  performance  and  a  1024  word 

RAM  for  microinstruction  storage.  A  RAM  was  used  instead  of  a  ROM  to  enable 

the  loading  of  various  function  routines.  Thus,  each  processor  can  be  con¬ 
figured  as  any  specific  function  by  a  proper  choice  of  microinstructions.  All 
tests  were  carried  out  by  selecting  the  function  to  be  an  integrator  and  using 
various  numerical  schemes  to  approximate  the  desired  integration  function.  The 
processor  module  is  shown  in  Figure  5.  For  full  parallel  operation,  one  process 
or  module  is  needed  for  each  state  variable. 

3.  Function  Module:  The  function  module  performs  the  calculation 

F  =  AxB  +  CxD  +  E  (3) 

where  F  is  the  output  and  A,  B,  C,  D  and  E  are  digital  values  on  five  input 
busses.  Parallel  operation  requires  one  function  module  for  each  state  variable 

The  structure  for  the  function  module  is  shown  in  Figure  6. 

4.  Bus  Interface  Modules:  The  output  bus  interface  module  is  used  to 
latch  each  processor  output.  This  value  can  then  be  applied  to  any  one  of  N 
busses  by  selection  of  the  latch  control.  The  input  interface  module  is  used  to 
latch  the  outputs  from  the  function  units.  One  of  these  is  then  selected  as  the 
input  to  each  processor  using  the  latch  control  lines.  The  number  of  modules 
required  is  equal  to  the  number  of  processors.  The  bus  interface  modules  are 
shown  in  Figures  7  and  8. 

5.  Coefficient  Register  Module:  The  constant  or  coefficient  registers 
are  used  to  store  the  a^  and  b^  constants  in  the  state  equation  matrix.  These 
modules  are  register  latches  in  the  SPOCK  structure.  They  are  loaded  with  the 
appropriate  constant  when  the  problem  is  initialized.  The  output  of  each  latch 
is  applied  to  the  appropriate  bus  using  a  sequencer  to  control  the  latch  enable 
lines.  The  module  structure  is  shown  in  Figure  9. 
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If  equation  (1)  is  permitted  to  have  time  varying  coefficients,  it  will 
be  necessary  to  change  the  constant  registers  to  a  circular  queue.  The  values 
would  then  be  shifted  and  gated  in  much  the  same  way  the  current  configuration 
handles  the  constants.  The  change  would  require  loading  of  all  the  discrete 
points  when  the  problem  is  initialized,  and  control  information  to  perform  the 
shifting  operation. 

IV.  PERFORMANCE  ANALYSIS 


A.  Microprogram 

Each  of  the  nine  integration  schemes  given  in  Table  1  has  been  coded  into 
a  microprogram  for  the  SPOCK  processors.  These  integration  routines  have  com¬ 
plexities  which  are  reflected  in  the  program  size.  This  program  size  is  given 
in  Table  2  as  the  number  of  microinstructions  required  to  implement  the  routine. 

For  example,  twelve  microinstructions  are  required  to  implement  the  Modified- 
Euler  routine.  Thus,  each  integration  step  requires  the  execution  of  these 
twelve  instructions. 

The  various  integration  routines  execute  their  microinstructions  sequentially 
except  for  the  Parallel  Adams-Bashforth  Predictor/Corrector .  This  routine  is 
implemented  such  that  the  predictor  (5  microinstructions)  can  be  computed  in 
parallel  with  the  corrector  (10  microinstructions)  using  two  processors.  This  is 
the  only  "parallel"  integration  routine  which  was  implemented.  Performance 
results  will  reflect  this  parallel  versus  sequential  implementation. 

B.  Function  Evaluations 

The  evaluation  of  an  integration  step  requires  the  evaluation  of  the  function. 


f  (Yn’  X„> 

n  n 


to  obtain  the  new  value  Yn+^*  From  Table  1  it  can  be  seen  that  the  new  point, 

Y  ...  can  be  evaluated  for  any  of  the  first  seven  routines  if  prior  points  have 
n+1 

been  saved  and  Y^  is  available.  This  function  evaluation  requires  the  use  of 
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Table  1.  Numerical  Integration  Methods 


Euler 


Y  =  Y  +  TY 
n+1  n  n 


Modified  Euler 


Yn+i  =  yC  +  TYc 
n+1  n  n 


YC  =  yC  +  —  (YP  +  YC) 
n+1  *n  +  2  (Yn+l  +  V 


Adams-Bashforth  Two  Point  Predictor 


fn+i  =  Y  +  TY  +  i  (Y  -  Y  . ) 
n+1  n  n  2  n  n-1 


A  -  B  Two  Point  &  Trapezoidal  Corrector 

Yp  =  YC  +  TYC  +  1  (YC  -  YC  ,) 
n+1  n  n  2  n  n-1 
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Table  1.  Numerical  Integration  Methods 


Three  Point  Runge-Kutta 

Y  .  =  Y  +  T(y  Y  +0Y  1  +  3  Y  _J) 

n+l  n  4  n  n-hr  t  n+rr 
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•  1  T  T* 
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Four  Point  Runge-Kutta 
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. ,  =  y  +t(t-y  +  y  y  i  +  y  y  i  +  r  y  ) 
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n  n  n 
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Table  2.  Program  Characteristics  for  Numerical  Methods 


Method 

Microprogram 

Size 

No.  of  Function 
Evaluations 

Euler 

4 

1 

Modified  Euler 

12 

2 

2nd  A-B  Pred. 

8 

1 

2nd  A-B  P/C 

22 

2 

Parallel  P/C 

5(P) 

1 

10(C) 

1 

3rd  Adam's  Pred. 

14 

1 

3rd  Adam's  P/C 

27 

2 

3rd  Runge-Kutta 

18 

3 

4th  Runge-Kutta 

27 

4 

i 


the  function  network  to  solve  one  row  of  the  SPOCK  state  equation.  For  Euler's 


method,  one  pass  through  the  function  network  is  required  since  Y^  can  be  used 

with  a  stored  value  of  Y^  to  compute  Yn+^.  However,  for  the  Modif ied-Euler 

‘  .c 

method,  the  fun'tion  network  must  be  used  twice;  once  to  compute  Y  which  is  used 

p  •  p  c 

to  compute  Y^+  ,  and  Y^+^  which  can  be  used  to  generate  Y^+^.  The  Parallel 
Adams-Bashforth  P/C  is  again  a  special  case  in  that  the  two  function  evaluations 
can  be  done  in  parallel  by  using  two  hardware  units. 

The  Runge-Kutta  methods  differ  from  the  othei  methods  in  that  function 
values  cannot  be  stored  to  reduce  the  computation  at  each  step.  These  methods 
require  a  number  of  intermediate  values  in  going  from  point  nT  to  point  (n+l)T. 
Each  of  these  values  represents  a  function  evaluation  resulting  in  a  total  of 
three  for  the  third  order  method  and  four  for  the  fourth  order  method. 

The  required  number  of  function  evaluations  per  integration  step  is  given 
in  Table  2  for  each  of  the  nine  methods.  These  function  evaluations  impact 
performance  due  to  the  computation  delay.  For  most  methods,  this  is  a  sequential 
delay  which  is  added  to  the  integration  time.  For  the  Modif ied-Euler ,  the  delay 
can  be  broken  down  into  the  components  shown  in  Figure  10.  Assume  the  routine 

Q 

has  just  computed  Y  .  The  next  four  steps  will  be:  (1)  function  computation  for 

yC,  (2)  integration  step  for  yj^,  (3)  function  computation  for  y^+^,  and 

c 

(4)  integration  step  for  Yn+^*  Each  of  these  is  shown  in  Figure  10. 

C.  Performance  Equation 

1.  Definitions:  A  performance  equation  can  be  derived  based  on  the  delays 
discussed  in  the  two  previous  sections.  The  performance  is  related  to  the  solu¬ 
tion  of  a  state  vector  of  N  first  order  differential  equations.  It  is  first 
necessary  to  define  the  following  parameters: 
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Time  to  compute  one  integration  step  for  the  state  vector 


Dp  -  Function  Unit  Delay 

Dp  -  Processor  Delay  in  Computing  one  Integral  Step 

K.  -  Number  of  passes  through  the  function  unit  for  one  integration  step 

0^  -  Processor  Overlap  with  function  unit 

N  -  Number  of  state  equations 

I  -  Number  of  processor  modules 

F  -  Number  of  function  unit  modules. 

The  processor  delay.  Dp,  is  a  function  of  the  number  of  microinstructions  and 

the  time  it  takes  to  execute  each  instruction.  For  predictor-corrector  methods, 

this  delay  could  be  separated  into  the  predictor  delay,  Dpp,  and  the  corrector 

delay,  D  .  However,  when  these  two  are  added  the  result  is  D  . 

IC  X 

2.  Overlap  Time:  The  overlap  time,  Op,  represents  the  time  associated 
with  certain  integration  routines  which  can  be  hidden  by  doing  the  calculation 
in  parallel  with  the  function  unit.  For  example,  the  Adams  Three  Point  Predictor 
is  given  by 


n+1 


23T  ' 

=  Y  +  —  Y 
n  12  n 


4T  st  * 

—  Y  +  -—  Y 
3  n-1  12  n-2 


(6) 


While  the  function  unit  computes  Y  ,  the  processor  can  begin  the  computation 
based  on  the  terms  which  are  already  known.  In  this  case,  the  overlap  time  would 
be  equal  to  Dp  since  several  multiplies  and  adds  can  be  performed  before  Y^ 
is  needed.  It  should  be  noted  that 

Op  <  Dp  and  Op  <  Dp  (7) 

3.  Performance  Equations:  To  illustrate  the  development  of  the  performance 
equations  three  separate  cases  will  be  considered  using  the  Modif ied-Euler 
integration  technique.  In  the  first  case  N  =  I  =  F.  For  this  case,  Figure  10 
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indicates  the  sequence  for  one  integration  step.  This  sequence  and  the  res¬ 
pective  delays  are 


Step  1 
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yn 

df 

Step  2 

yp 

yn+l 

°IP 

Step  3 

yp 

yn+l 

°F 

Step  4 

c 

yn+l 

dic 

The  resulting 

;  time  is  then 

S  =  2  D 

L,  +  +  Dx_ 

(8) 

T 

F  IP  IC 

Extending  to 

the  general  solution  and  including  the  possibility  of  overlap  with 

the  function 

units  yields 

st  =  kdf 

+  <°i  -  V 

(9) 

For  this  particular  integration 

routine,  0^  would  be  zero. 

In  case 

two,  let  N  =  3,  I  = 

:  2,  F  =  3.  The  computation  sequence  and  the 

delays  are  then 

’c 

•  • 

c  c 

Step  1 

yl,n 

y2,n  y3,n 

df 

Step  2 

yP 

yl , n+1 

yP 

y2,n+l 

°IP 

Step  3 

yP 

y3,n+l 
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Step  4 

yl,  n+1 

'P  ’P 

y2 ,  n+1  y3,  n+1 

°F 

Step  5 

y 1 , n+1 

y 2 , n+1 

dic 

Step  6 

y3,  n+1 

DIC 

The  computation  time  for  the  state  vector  is 
STI  '  2DF  +  2DIP  +  2DIC 
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where  S^,^.  is  the  solution  time  for  the  multiplexed  processor  case. 
In  general,  this  equation  becomes 

STI  *  “f  +  Dl(Y| 

where  j~X~j  represents  the  smallest  integer  >X. 
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For  the  third  case,  let  N=3,  1=3,  F=2.  The  computation  sequence  and 


associated 

delays  are 
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The  computation  time  for  the  state  vector  is 


STF  “  4DF  +  °IP  +  DIC 


=  AD*.  +  Dj 


(12) 


where  STF  is  the  solution  time  for  the  multiplexed  function  unit  case.  In 
general,  this  equation  becomes 


STF-Kfl]  Df  +  BI 


(13) 


Equations  (9),  (11),  and  (13)  represent  the  three  cases  for  any  of  the 
integration  routines.  In  (11)  and  (13),  the  term  0^.  should  be  subtracted  as  in 
(9).  When  this  is  included,  the  multiplexed  times  can  be  normalized  to  give 


TI 


1  +  P 


1+P 
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(14) 
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and 


Other  forms  of  multiplexing  the  hardware  will  yield  relative  time  increases  of 


ST  -  STIF  -  N  ST 


D.  Error  Analysis 


1.  Truncation  Errors:  When  using  a  numerical  integration  routine,  a 
truncation  error  is  introduced.  This  error  is  related  to  the  difference  between 
the  given  numerical  method  and  the  Taylor  Series  for  the  expansion  of  a  function 
[18] .  Typically,  numerical  methods  will  agree  with  the  Taylor  Series  through  the 
first  few  terms.  The  first  term  which  doesn't  agree  can  be  used  to  establish  a 
bound  on  the  computation  error.  These  bounds  are  shown  in  Table  3  for  the  first 
seven  numerical  methods.  Runge-Kutta  truncation  errors  are  more  difficult  to 
express  in  this  compact  form.  An  alternate  error  technique  is  used  for  these 
methods,  called  Richardson's  Extrapolation  [19]. 

To  develop  these  error  expressions,  let  y^  and  yn+1  be  the  true  solutions 
at  steps  n  and  n+1.  In  order  to  go  from  step  n  to  n+1,  a  single  step,  h,  can  be 
used  once  or  a  step  of  h/2  can  be  used  twice.  The  solution  when  the  step  is  used 
twice  is  y ^  and  the  solution  when  the  step  is  used  once  is  y  .  These  solutions 
generate  errors  given  by 

-  n  =■ zr+1  c„ »r+1  ■  h  (25> 

-  »2  “  2  C„  h'+l  *  E2  <26) 

where  r  is  the  order  of  the  numerical  method.  When  is  eliminated,  the  result 


y  =  e. 


This  equation  can  be  used  for  any  of  the  numerical  methods,  but  is  particularly 


attractive  for  Runge-Kutta  algorithms. 


Table  3. 


Truncation  Errors  for  Numerical  Methods 


Method 


Euler 


Modified  Euler 


2nd  Adams  Bashforth 
Predictor 

2nd  Adams  Bashforth 

Predictor /Corrector 


Parallel  Predictor/ 
Corrector 

3rd  Adams  Predictor 


3rd  Adams  Predictor/ 
Corrector 


Truncation  Error 
nT  <e<  (n+l)T 


(e) 


(e) 


(e) 


(t) 


(e) 


(e) 


(e> 
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The  errors  for  third  and  fourth  order  Runge-Kutta  methods  are  then 


y2-yi 
errrk3  1  — 


y2~yl 

ERRRK4l^ 


(28) 


(29) 


where  ERR^^  and  ERR^^  represent  the  local  truncation  error  in  the  respective 
Runge-Kutta  methods. 


2.  Round-Off  Errors:  Another  source  of  error  is  due  to  finite  precision 
arithmetic.  This  error  is  called  round-off  [20].  In  the  SPOCK  computer,  all 
data  and  calculations  are  carried  out  in  16-bit,  twos  complement  arithmetic. 
One  of  the  objectives  of  the  research  was  to  establish  the  relative  importance 
of  round-off  error  compared  to  truncation  error. 

3.  Error  Curves:  In  order  to  get  some  performance  results,  a  particular 
equation  was  selected.  The  equation 


•  •  • 

y  (t)  +  2  (0.139999)  y(t)  +  y(t)  =  0 

y(0)  -  0,  y (0)  =  0.75 


(30) 


was  solved  on  SPOCK  using  all  nine  integration  methods.  True  values  to  (30) 
were  obtained  using  the  exact  solution  and  120-bit  floating  point  arithmetic. 

The  difference  between  these  two  solutions  gives  the  error  generated  by  SPOCK 
when  solving  (30).  This  set  of  error  curves  is  shown  in  Figure  11  as  a  function 
of  step  size.  The  error  which  is  plotted  is 

n  2 

ERRtot  -  AT  E  |Y120i  -  YSPOC^ |  (31) 

where  AT  is  the  step  size,  Y120^  represents  the  true  solution  at  step  i, 

YSPOCK^  is  the  SPOCK  solution  at  step  i,  and  ERRTQT  is  the  total  error.  The 
step  size  was  selected  by  setting  &T  *  2tt/  12*2^  where  i  =  0,  1,  2,  ..., 


7. 


In  order  to  separate  the  error  into  its  two  components,  equation  (30) 
was  solved  by  each  of  the  numerical  methods  using  120  bit  floating  point 
arithmetic.  The  exact  values  were  also  calculated  using  120  bits.  Roundoff 
error  in  this  case  was  assumed  to  be  negligible.  The  truncation  error  was 
calculated  by 

n  2 

ER^  =  AT  E  | Y120  -  YNUM  ]  (32) 

i=l 

where  AT  is  the  step  size. 

Y120^  is  the  true  value  at  step  i,  YNUM^  is  the  value  at  step  i  using  a 
particular  numerical  method  and  ERR^,Ry  is  the  truncation  error.  The  set  of 
error  curves  produced  by  (32)  is  shown  in  Figure  12. 

The  round-off  error  can  now  be  determined  by  forming: 

eerko  ■  errtot  -  errtru  <33> 

where  ERR  is  the  round-off  error  for  a  particular  numerical  method  using  the 
KU 

SPOCK  hardware.  These  error  curves  are  shown  in  Figure  13.  The  superiority 
of  the  higher  order  methods  can  be  made  more  specific  by  the  following  perform¬ 
ance  analysis. 

E.  Relative  Performance 

In  order  to  assess  the  capability  of  SPOCK,  the  following  hardware  para¬ 
meters  are  used: 

0X  -  0; 

Dj.  *»  (Number  Microinstructions)  x  250  nsec. 

D  *  500  nsec. 

r 

Using  the  time  parameters  given  above,  the  various  methods  have  a  solution  time 
per  step  as  indicated  in  Table  4.  For  example,  the  Modified-Euler  method  has  a 
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RELflT .  V£  ERROR 


O  EULER 

A  M00IF1E0  EULER 

♦  2ND  RORNr-BRSH.  PREO* 
X  2ND  BOflnS-ORSH.  P/C 

♦  PBRRLLEL  fWfif18-8ftSH. 

♦  sro  fiwmc  rniR  pred. 

X  SRO  RORHS  PAIR  P/C 
2  SRO  RUNGC-KUTTfi 
V  4TH  RUNOE-RUTTR 


id-*  to*1 

STEP  SIZE  ( RflOIflNS ) 


Figure  12.  ACCUMULATED  TRUNCATION  ERROR 


RELATIVE  ERROR 


10-4 


10-4 


10-4 


10-4 


10  "4 


10-^ 

. 

10-2 


10 


©  EULER 

A  H0OIFIEO  EULER 
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X  3RD  ROAMS  PAIR  P/C 

Z  3RD  RUNOE-KUTTA 
Y  4TH  RUNOE-KUTTA 
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— r  tii — r  f  tri - 1 

10  "* 

SIZE  (RROIANS) 


r  1  1  1  1  r 


10 


-* 


Figure  13.  ACCUMULATED  ROUND-OFF  ERROR 
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solution  time  per  step  of: 


ST  =  KDf  +  Dj  -  0l 


(2)  (500)  +  (12)  (250) 
4000  nsec . 


The  error  curves  can  be  used  to  pick  the  maximum  step  size  for  each  numer¬ 
ical  method.  In  order  to  set  the  step  size,  the  absolute  error  as  a  percent  of 
the  maximum  value  (1.0  for  the  SPOCK  machine)  was  used.  A  typical  error  curve 
is  shown  in  Figure  14  for  the  Modif ied-Euler  numerical  method  with  T  =  0.1309. 

This  shows  the  maximum  error  to  be  approximately  0.6%.  When  T  =  0.262  for  this 
numerical  method,  the  maximum  error  exceeds  1%.  Thus,  for  the  performance 
analysis,  the  maximum  step  size  for  the  Modif ied-Euler  method  is  set  at  T  =  0.1309. 
A  similar  analysis  for  the  other  routines  leads  to  the  maximum  step  sizes  given 
in  Table  4. 

The  maximum  step  size  for  the  Modif ied-Euler  method  is  T  =  0.1309  radians. 

max 

Thus,  the  maximum  frequency  using  the  Modif ied-Euler  method  is 


f  =  0.1309  x  10 
max  2tt  x  4.0 


5208  hz. 


A  similar  calculation  can  be  carried  out  for  each  method  to  give  the  upper  fre¬ 
quency  limit  shown  in  Table  4.  It  should  be  emphasized  that  this  limit  also 
includes  the  requirement  that  the  maximum  absolute  error  will  be  less  than  1% 
over  the  solution  range. 

F.  Stability  and  Convergence 

The  error  in  a  numerical  method  is  related  to  the  stability  and  convergence 
of  the  method.  It  is  generally  reported  in  the  literature  that  stable  methods 
give  good  solutions  [21,  22].  A  more  accurate  statement  is  that  a  stable  method 
gives  a  solution  which  converges  to  the  true  solution  for  large  values  of  the 
independent  variable. 
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SPOCK  I  ABSOLUTE  ERROR  USING  MODIFIED-EULER  INTEGRATION  METHOD 


Table  4. 

SPOCK  Performance 

Analysis 

Method 

Time 

per 

Step  (ysec.) 

T 

max 

(Radians) 

Maximum 

Frequency 

(Hz) 

Euler 

1.5 

0.0041 

434 

Modified  Euler 

4.0 

0.131 

5208 

2nd  A-B  Predictor 

2.5 

0.0327 

2083 

2nd  A-B  Predictor/ 
Corrector 

6.5 

0.131 

3205 

Parallel  P/C 

3.0 

0.0163 

868 

3rd  Adams  Predictor 

4.0 

0.131 

5208 

3rd  Adams  Predictor/ 
Corrector 

7.75 

0.262 

5376 

3rd  Runge-Kutta 

6.0 

0.262 

6944 

4th  Runge-Kutta 

8.75 

0.524 

9524 
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For  example,  assume  it  is  desired  to  solve  the  equation 

y(t)  +  y(t)  =  10u(t) ,  Y (0)  =  0  (36) 

using  Euler's  method.  Substituting  for  y(t)  gives 


^n+1  ^n  +  y  =  10u(n),  Y(0)  =  0 
T  n 

where  T  is  the  integration  step  size.  Transforming  to  the  Z-domain  yields 

lOTz 


(37) 


zY (z)  *  -T  y(z)  +  y (z)  + 
y(z)  [  z  +  T  -  1  ]  = 


z  —  1 
lOTz 


z-1 


y(z) 


lOTz 


(z-1)  (z+T-1) 
lOTz 


(38) 

(39) 

(40) 

(41) 


(z-1)  (z  -  ( 1-T) ) 

The  solution  for  Y(z)  is  stable  provided  the  roots  of  the  denominator  lie  within 
the  unit  circle  (the  point  Z  «  1  is  a  special  case).  Thus,  for  stability 


0  <  T  <  2.  (42) 

The  solution  to  (41)  is 

y?nT)  =10  [1  -  (1-T)n]  (43) 

The  true  solution  to  (36)  is 

y(t)  =  10  (l-e-t),  t  >  0  (44) 


If  this  solution  is  examined  at  discrete  points,  the  values  are 

y (nT)  -  10  (l-e-nT)  ,  n  >  0  (44) 

The  error  at  discrete  points  is 

ERR0Rn  -  |y(nT)  -  y (nT) j  =  j 10  (1-T)n  -10e"nT(  (46) 

Note  that  lie  T  -O  for  the  error  is  zero  at  all  finite  values  of  nT.  But  also 
note  that  when  T  j*  0,  errors  exist  at  all  points  t  =  nT.  A"  important  concept 
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is  the  relative  error  [23]  defined  as 


REL  ERROR 


REL  ERROR 


ERROR 


n' 


I  (1-T)n  -e~nT I 


y(nT) 


1-e 


-nT. 


/■.  „,.n  — nT 

(1-T)  -e 


1-e 


■nT 


For  distinct  values  of  T  the  following  relative  errors  exist 

<  1  ;  n  >  0 


T  =  0.1  REL  ERROR 


T  =  0.5  I REL  ERROR 


T  =  1.0  REL  ERROR 


T  =  1.5  REL  ERROR 


T  =  2.0  REL  ERROR 


( . 9)n  -  e~‘ ln 


- .  ln 


1  -e 


,  RXn  -. 5n 
(.5)  -e 


1  -e 

-n 

-e 


-  .5n 


-n 


1-e 


/  -1 . 5n 

(•5)  -e _ 


..  -1.5n 

1-e 

f  ,^n  -2n 

(-1)  -e 


1-e 


-2n 


<  1  ;  n  >  0 

<  1  ;  n  >  0 

<  1  ;  n  >  0 

>  1  ;  n  >  0 


(47) 

(48) 


These  values  indicate  for  values  of  T  except  T  =  2  the  relative  error  is  less 
than  one.  This  implies  the  approximate  solution  is  "close"  to  the  true  solu¬ 
tion  or  converges  to  the  true  solution.  It  is  evident,  however,  that  for  T  =  0.5 
an  error  exists  for  all  finite  values  of  n.  As  n  -*•  00  in  this  case,  the  relative 
error  -*-0.  A  similar  situation  exists  for  T  =  0.1,  T  =  1.0,  T  =  1.5,  and  in 
general  all  T  which  yield  a  stable  solution.  Hence,  stability  implies  a  solution 
that  converges  to  the  true  solution  as  t  -*■  00 .  It  does  not  imply  an  accurate 
solution  for  finite  values  of  t  =  nT. 

Accuracy  requires  an  examination  of  the  relative  error.  For  example,  con¬ 
sider  the  two  cases  T  =  0.1  and  T  =  1.0,  both  of  which  give  stable  solutions. 
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The  relative  errors  for  these  two  cases  at  the  same  time  points  are 


|rel  error 

(T  =  0.1)| 

l(.9)10n- 
|  l-e”n 

■e“n.l 

> 

n  =  1,  2, 

|rel  error 

(T  =  1.0)  | 

=  1  -e“n 

1  I-'" 

_L 

> 

n  =  1,  2,  . . . 

which  shows 

|rel  error 

(T  =  0.1) 1 

<  |rel  error 

(T  = 

1.0) 

Hence,  the  choice  of  T  =  0.1  produces  a  more  accurate  solution  at  each  point 
t  =  nT  than  the  choice  of  T  =  1.0. 

In  general,  stability  of  the  solution  is  not  the  driving  requirement  on 
the  choice  of  step  size.  Accuracy  at  each  step  will  usually  dictate  a  much 
smaller  step  size  than  the  stability  requirement.  If  the  solutions  to  (36) 
using  T  =  0.5,  1.0  and  1.5  are  compared  to  the  true  solution  as  in  Figure  15, 
it  is  obvious  that  certain  solutions  are  very  inaccurate.  These  solutions  are 
stable  and  will  converge  to  the  exact  solution  for  large  values  of  nT.  However, 
in  simulation  work,  this  is  seldom  the  case  of  interest.  The  performance  analysis 
section  used  a  more  conventional  accuracy  requirement  based  on  the  error  as  a 
percent  of  the  maximum  value.  This  is  the  typical  performance  specification 
used  by  analog/hybrid  computers  [24].  Further,  the  error  limit  of  1%  of  the  max¬ 
imum  value  is  also  consistent  with  analog/hybrid  hardware. 
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V.  CONCLUSIONS  AND  RECOMMENDATIONS 


A  digital  structure  has  been  designed,  built  and  tested.  The  results  show 
that  a  digital  simulation  machine  can  be  an  effective  computing  structure  pro¬ 
vided  higher  order  numerical  methods  can  be  implemented.  In  general,  the  pre¬ 
liminary  conclusions  can  be  stated  as: 

1.  The  computing  structure  is  simple  and  only  requires  five  distinct 
modules . 

2.  The  number  of  busses  is  proportional  to  the  order  of  the  differ¬ 
ential  equations. 

3.  For  a  general  structure,  the  interconnection  of  the  busses  and 
device  modules  is  the  most  complicated  aspect  of  the  structure. 

4.  The  processor  modules  require  only  elementary  software  to  con¬ 
figure  them  as  a  specific  integrator. 

5.  The  sequencing  of  the  structure  to  carry  out  the  solution  of  a 
set  of  state  equations  is  simple.  It  has  been  done  in  software 
with  only  a  few  instructions,  but  can  be  done  with  hardware  to 
improve  the  speed.  The  hardware  solution  is  about  the  same  complexity 
as  one  of  the  processor  modules. 

6.  One  typical  system  equation  has  been  solved  and  evaluated.  In 
general,  higher  order  numerical  methods  give  superior  performance. 

For  these  methods  16-bit  arithmetic  is  more  accurate  than  the 
associated  truncation  errors. 

7.  Performance  using  less  than  state  of  the  art  digital  technology 
will  give  real-time  solutions  of  equations  with  frequency  components 
up  to  10  KHZ.  The  accuracy  of  the  solutions  will  be  approximately 
1%  of  the  maximum  range.  Improved  technology  could  easily  raise  the 
frequency  limit. 


SPOCK  I  represents  only  the  first  step  in  a  novel  application  of  computer 
architecture  to  a  special  computing  problem.  The  design  was  limited  in  scope 
and  did  not  attempt  to  cover  all  classes  of  problems  which  could  possibly  be 
solved  by  such  a  structure.  The  following  list  presents  a  number  of  topics 
for  further  study  and  work  to  extend  the  field  of  knowledge  in  this  area. 

1.  Software  translator  for  automatic  program  set-up  and  sequencing. 

2.  Automatic  scaling  when  variables  exceed  the  overflow  limit. 

3.  Extension  to  include  time  varying  coefficients  and  nonlinear  functions. 

4.  Extension  to  include  a  user  interface  for  direct  entry  of  an  equation 
and  a  display  of  the  solution. 

5.  Additional  study  on  truncation  errors,  round-off  errors  and  total  error 
to  ascertain  the  effect  of  additional  poles  or  zeroes  on  these  error  com¬ 
ponents.  The  work  thus  far  has  concentrated  on  a  dominant  pair  of  complex 
poles  in  the  differential  equation.  It  is  believed,  though  not  verified, 
that  this  pair  will  set  the  limits  on  the  maximum  step  size. 

6.  Additional  study  on  performance  improvements  via  alternate  structures, 
custom  hardware  chips  and  different  bussing  concepts. 
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